\chapter{Introduction}

\section{The Quantum Information Revolution}
\begin{quote}
Information is physical. --\emph{Rolf Landauer}
\end{quote}
The idea that information is a physical quantity, along with 
the related idea that all physical systems store, manipulate and
process information, has fundamentally changed
our view of the physical world.  The language and the methods of
information science now permeate the physical sciences,
while the fundamental physical limitations on computing are beginning
to have an impact on computer science as the size of computer chips
approaches the scale of atoms and molecules.  While all branches
of the physical sciences have
absorbed and benefited from information-theoretic ideas, few have been
so deeply affected by them as quantum mechanics.

Quantum mechanics, it turns out, is not only a theory of physical
systems, subatomic particles,
atoms, molecules, crystalline solids and so on; it is also a theory
about information.  Atoms and molecules, when static,
store information and, when engaged in physical processes, can be
thought of as little computers.  Moreover, it is possible to abstract away the
information-theoretic part of quantum mechanics from its embodiment in
particular physical systems, to talk of quantum bits and unitary logic
gates instead of atoms and Hamiltonians.  Some researchers even hope that
the postulates of quantum mechanics can be entirely replaced by purely
information-theoretic ideas about the fundamental limits
placed by Nature on our ability to encode, decode and transmit
information\cite{Brassard}. 

This reinvention of quantum mechanics as a theory of information has
led to some startling discoveries in the past twenty years.  Most
notably it has revealed that computers that store and manipulate
quantum systems can be exponentially more powerful than ordinary
computers\cite{Shor} and that they can work even in the presence of errors\cite{Shor}.  It
has shown us that information stored in quantum systems, unlike
ordinary information, cannot be copied.  It can, however, be `teleported'
from one system to another, through processes that destroy the
information in the original system and transfer it to the other.
Furthermore, the peculiar way quantum states can be
perfectly correlated while being perfectly random can permit communications to be made completely
secure from eavesdropping, something that is viewed as
impossible in classical information theory.   

In concert with these changes in our conceptual understanding of
quantum mechanics, our ability to manipulate, control and measure
all aspects
of quantum systems has improved enormously.  Single atoms can now be
reliably trapped, manipulated and made to interact\cite{Wineland}.  Single
photons can be
generated at will, either alone or in highly correlated multi-photon
states\cite{Kwiat}.  Single atoms can be made to interact with single photons with
exquisite control\cite{Kimble}.  And large many-body systems can be created and
interacted completely coherently with a high degree of control over
the details of the interaction\cite{Bloch}.  These advances have
permitted the development of an experimental branch of quantum information science and have
led to a great many proof-of-concept demonstrations of the main ideas of quantum
information theory and fundamental quantum mechanics.  A partial
list of these landmark experiments includes the first
unambiguous violation of Bell's inequalities\cite{Aspect}, the
first demonstration of secure quantum
cryptography\cite{Brassard}, the first demonstration of
teleportation\cite{zeilinger}, the demonstration of quantum
logic gates\cite{white} and simple quantum
algorithms\cite{kwiat,white}.  At the same time, these
developments have created a need for a fuller understanding
of how to connect the information gleaned from quantum measurements to the
information inherent in the quantum state.  It is the main aim of this
thesis to contribute towards this important area of research.

To the extent that a quantum state is about
information rather than physics, the interaction between the
physical system and the information it contains is mediated via
measurement.  Measurement has always been a deep concern in quantum
mechanics, assuming a leading role in Heisenberg's uncertainty
principle, the postulates of quantum mechanics and various thought
experiments such as the EPR experiment\cite{EPR}, Bell's
inequalities\cite{Bell} and the Kochen-Specker theorem\cite{Kochen-Spekker}, which formalise the
most irreconcilable differences between the classical and quantum
worlds.  Measurement also serves as a lifeline connecting the
abstract Hilbert spaces in which quantum dynamics appears to occur and the
real world of numerical physical quantities.  As Niels Bohr put
it,
\begin{quote}
On the one hand, the definition of a physical system, as
ordinarily understood, claims the elimination of all external
disturbances.  But in that case, according to the quantum
postulate, any observation will be impossible, and, above all,
the concepts of space and time lose their immediate sense.  On
the other hand, if in order to make observation possible we
permit certain interactions with suitable agencies of
measurement, not belonging to the system, an unambiguous
definition of the state of the system is naturally no longer
possible, and there can be no question of causality in the
ordinary sense of the word.  The very nature of the quantum
theory thus forces us to regard the space-time co-ordination and
the claim of causality, the union of which characterizes the
classical theories, as complementary but exclusive features of
the description, symbolizing the idealization of observation and
definition respectively.
\end{quote}

While quantum systems cannot be viewed as having real properties
that are independent of measurement, they can, nevertheless, be
characterized if enough identical copies are made of them.  This
characterization is conceptually similar to the way that a large
number of systems obeying classical mechanics with unknown
individual properties can nevertheless be characterized by their
thermodynamic properties arising from the fact that each
particle can be described by the same probability distribution. 

While in some quantum systems characterization is straightforward, in others the effects of
interactions, indistinguishability and unaccounted-for degrees of freedom
can make the question `What is the quantum state of my system?' an
extremely thorny one even to phrase in terms of clear experimental
observables, let alone to answer.  Chapter 3 of this thesis examines just such a case
where the indistinguishability of photons makes their quantum state
impossible to describe accurately.  By instead concentrating on the
observables which it is possible to measure we arrive at a complete,
elegant and scalable method of characterizing and measuring these
states.
  
The work in this thesis draws on the notion of
information as a physical quantity in the area of experimental quantum
state characterization.  Chapter 2 discusses the experimental
techniques used to study quantum states in the lab.   Chapter 3 focuses on the interplay between
the remarkable phenomenon of indistinguishability of quantum particles
and the ability to extract information about a quantum system through
measurement.  In it we develop a complete theory of how much can be
learned about the quantum state of a system of particles that are
indistinguishable to experimental measurements but may or may not be
fundamentally distinguishable.  Chapter 4 examines how to structure a
set of measurements so as to maximize the information extracted with
each one.  On a technical level this leads to better state estimation
when the number of copies of the state is fixed.  This
optimality is deeply connected to the geometry of the Hilbert
space of measurements and helps to connect that geometry to
operationally relevant parameters.  More surprisingly, this
optimal set of measurements leads to a new description of quantum states on a discrete
phase-space instead of a Hilbert space.  This new description has several useful and intuitive
properties that may make it a better way to think about qubit states than
the standard descriptions.  Finally, chapter 5 looks at how some
properties of states that are usually thought to require full
measurement of the density matrix can be obtained directly through
judicious choice of measurements.  This will be an important
technology as quantum systems become larger and the exponential scaling of
the size of the density matrix of a system with the number of
particles it contains makes state tomography impractical. 

\subsection{History}
The single most exciting development in quantum mechanics of the last
thirty years has been its fusion with information science to create
the new discipline of quantum information.  While even in its earliest
days it was understood that quantum mechanics had deep implications
for information and the limits imposed on its acquisition through the
Heisenberg uncertainty principle, it took several decades of slow
development to realize the very deep implications of information
theory for quantum mechanics and the even more surprising implications
that quantum mechanics has for our understanding of information.

While the links between physics and information theory can be traced
back to Maxwell's demon and Shannon's quantification of information in
terms of an entropy measure, the first researcher to take
a serious research interest in the link between physics and
information was Rolf Landauer.
It was he who first conducted an analysis of computation in the
language of thermodynamics and recognized that physical laws implied
limits for how computers might function.  His work was later developed
by his colleague Charles Bennett, who showed that computation could be
undertaken reversibly, with an increase in entropy resulting only from
the initial setting of the input to the computation.  Shortly
thereafter Paul Benioff showed that this analysis could be
extended into the world of quantum mechanics and
that a system that evolved unitarily under a Hamiltonian could be
thought of as a computer performing information processing in moving
from one state to another.

The question then became whether a quantum information processor
was fundamentally different from an ordinary computer.  Feynman, Deutsch, Jozsa
and others addressed this problem by developing an intuitive
rationale for why it might be so, along with some toy algorithms\cite{Deutsch-Joscza}
that seemed to be computationally easier to perform in a quantum
system than in a classical one.  Convincing evidence for a real practical
advantage from quantum information processors was only obtained
when Peter Shor proved that a quantum algorithm could factor large
integers efficiently.  After that the funding floodgates opened and quantum
computing entered the physics mainstream.

The interest in quantum computing was part of a broader research
program to understand  

The business of characterizing quantum states began with 

\section{Concepts}
\subsection{The density matrix formulation}
While one often uses state vectors to describe
the quantum state of a system in terms of its wavefunction, in
experimental work (and in many other situations) it is usually
preferable to describe the state using the more general density
matrix formulation.  The density matrix or density operator is a
Hermitian operator on the Hilbert space of wavefunctions.  One
way to think of it is as a probability distribution over
projectors onto different wavefunctions.  That is to say
\begin{equation}
\rho=\sum_i p_i \ket{\psi_i}\bra{\psi_i}
\end{equation}
where the $p_i$ are real numbers on $\left[0,1\right]$ such that
$\sum_i p_i=1$.  The non-negativity of the $p_i$ implies that the
eigenvalues of $\rho$ are non-negative and, in particular, that
for any row $x$ and column $y$,
$|\rho_{xy}|=|\rho_{yx}|\leq\sqrt{\rho_{xx}\rho_{yy}}$.  The
off-diagonal elements for which $x\neq y$ are called
\emph{coherences} while the diagonal elements are called
\emph{populations}.  
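These defining properties are easy to check numerically.  The following sketch (Python with NumPy; the particular 50/50 mixture of projectors is chosen purely for illustration) builds a density matrix from a mixture and verifies Hermiticity, unit trace and the bound on the coherences:

```python
import numpy as np

# Illustrative mixture: equal weights of two projectors (choice is arbitrary)
ket_h = np.array([1, 0], dtype=complex)
ket_d = np.array([1, 1], dtype=complex) / np.sqrt(2)

rho = 0.5 * np.outer(ket_h, ket_h.conj()) + 0.5 * np.outer(ket_d, ket_d.conj())

# Hermitian, unit trace, non-negative eigenvalues
assert np.allclose(rho, rho.conj().T)
assert np.isclose(np.trace(rho).real, 1.0)
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)

# each coherence is bounded by the geometric mean of the populations
for x in range(2):
    for y in range(2):
        assert abs(rho[x, y]) <= np.sqrt((rho[x, x] * rho[y, y]).real) + 1e-12
```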

While the density matrix can be rotated to any basis by applying
unitary operations, we usually choose a basis to be the
preferred basis and call this the \emph{computational basis}.
The computational basis has the property that all of the basis
states are separable, that is to say they can be written as a
tensor product of single-particle states.  This is not true for
all possible density matrices as will be seen in the section on entanglement.

When all coherences take their maximum values, i.e. when
$|\rho_{yx}|=\sqrt{\rho_{xx}\rho_{yy}}$ for all $x$ and $y$, the
state is said to be pure.  In that case only one of the $p_i$ is non-zero,
and the density matrix is a projector onto a single
state vector.  If all of the coherences are zero for a density
matrix written in the computational basis, the state
can be thought of as a classical mixture and its measurement statistics are
governed solely by ordinary probability theory applied to the
single-particle measurement outcomes.  A \emph{maximally mixed
  state} is a density matrix proportional to the identity
operator.  The purity\cite{MikeandIke} is a useful measure of whether the
state behaves more like a classical statistical mixture or more
like a pure quantum state.  We define
\begin{equation}
P=\text{Tr}\left[\rho^2\right].
\end{equation}
This measure is invariant under unitary operations; it equals $1$ for
a pure state and $1/d$ for the maximally mixed state in $d$ dimensions.

If $\hat{O}$ is a Hermitian operator on the Hilbert space of
states then the expectation value of the operator for a system
in the state $\rho$ is given by
\begin{equation}
\langle\hat{O}\rangle=\text{Tr}\left[ \rho \hat{O}\right]
\label{eq:expectation_values}
\end{equation}
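Both the purity and the trace rule for expectation values can be verified in a few lines (a Python/NumPy sketch; the pure and maximally mixed states are chosen only for illustration):

```python
import numpy as np

sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

rho_pure = np.array([[1, 0], [0, 0]], dtype=complex)   # projector onto |H>
rho_mixed = np.eye(2, dtype=complex) / 2               # maximally mixed state

def purity(rho):
    """P = Tr[rho^2]."""
    return np.trace(rho @ rho).real

def expectation(rho, op):
    """<O> = Tr[rho O] for Hermitian op."""
    return np.trace(rho @ op).real

assert np.isclose(purity(rho_pure), 1.0)    # P = 1 for a pure state
assert np.isclose(purity(rho_mixed), 0.5)   # P = 1/d for maximal mixing, d = 2
assert np.isclose(expectation(rho_pure, sigma_z), 1.0)   # <sigma_z> for |H>
assert np.isclose(expectation(rho_mixed, sigma_z), 0.0)
```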

The advantage of the density matrix description is that it is
capable of describing both the statistics of quantum states and
quantum measurement and the classical statistics induced by
experimental randomness.  This makes it the most appropriate
description for experimental data sets.  The density matrix also
has the property that, because it is a Hermitian operator, it is
itself an observable.  This makes it possible
to reconstruct the density matrix from measurements.   

Like a probability distribution, a density matrix is a
statistical description of a quantum state.  Depending on one's
preferred philosophy one can view it as being `really' a
description of the frequency of observation of certain
measurement outcomes, a state of knowledge about the outcome of
such measurements or a description of reality for a subsystem of
a larger, pure, system\endnote{This would be the de Broglie--Bohm
  viewpoint}.  These interpretational differences
make identical predictions for the outcome of experiments, of course, but one or the
other may be more convenient in understanding the density matrix
in a particular context. 

\subsection{Qubits}
Among the many great insights in Claude Shannon's 1948 paper
\emph{A Mathematical Theory of Communication}, perhaps the most
significant was that all information-carrying
systems are formally equivalent to a binary system of ones and
zeros, entities that he called bits.  This insight allowed him to
develop his theories of redundancy, data compression and
informational entropy completely independently of the physical
system carrying the information.  

Some of the earliest results in quantum information theory have
to do with the analogue of data compression for quantum
information.  It was in developing this theory that Schumacher
coined the term qubit, a contraction of quantum bit, to describe the
smallest quantity of \emph{quantum} information.  Ever since, the qubit has
been essential to abstracting the information-theoretic content away from physical
quantum systems, just as the bit has done for classical information
theory.

A qubit is a quantum two-level system.  It is described by a state vector in a
two-dimensional Hilbert space.  In the quantum information
literature it is typical to take the basis states to be
$\ket{0}$ and $\ket{1}$.  The experiments discussed in this
dissertation all involve a particular implementation of the
qubit, namely the polarization state of a single photon.
Since this is the only type of qubit we will be discussing in
this dissertation we adopt the conventional notation of using
$\ket{H}$ and $\ket{V}$, the horizontal and vertical
polarization states, as the basis states for our qubit\footnote{The treatment of polarization as a vector in a two-dimensional
Hilbert space actually dates back to the \emph{Jones calculus} invented by
R. C. Jones in 1941\cite{Jones1941}.}.  The beauty of the qubit
concept, though, is that any information-theoretic development
realized in one physical system like polarization is immediately
available in all other physical qubits, like the spin state of
a trapped ion, the direction of current in a superconducting loop, the
magnetic moment of a hydrogen atom in nuclear magnetic resonance
or the spin state of an electron trapped on a quantum dot.

It is useful to label particular superpositions of the $\ket{H}$
and $\ket{V}$ basis states which often come up in discussions.  Following the
conventions for Jones vectors\cite{Jones1941} (and as used in \cite{James2001}), we define 
\begin{align}
\ket{D} &\equiv \frac{1}{\sqrt{2}}\left(\ket{H}+\ket{V}\right)\\
\ket{A} &\equiv \frac{1}{\sqrt{2}}\left(\ket{H}-\ket{V}\right)\\
\ket{L} &\equiv \frac{1}{\sqrt{2}}\left(\ket{H}+i\ket{V}\right)\\
\ket{R} &\equiv \frac{1}{\sqrt{2}}\left(\ket{H}-i\ket{V}\right)
\end{align}
Sometimes in the quantum information literature $\ket{L}$ and
$\ket{R}$ will have opposite signs to those given here.  

Along with $\ket{H}$ and $\ket{V}$, these states are eigenstates
of the Pauli operators defined as,
\begin{align}
\sigma_x=\left(\begin{array}{cc}0 & 1\\ 1 & 0\\ \end{array}\right),\quad
\sigma_y=\left(\begin{array}{cc}0 & -i\\ i & 0\\ \end{array}\right),\quad
\sigma_z=\left(\begin{array}{cc}1 & 0\\ 0 & -1\\ \end{array}\right).
\end{align}
$\ket{H}$ and $\ket{V}$ are the $+1$ and $-1$ eigenstates of
$\sigma_z$.  $\ket{D}$ and $\ket{A}$ are the $+1$ and $-1$
eigenstates of $\sigma_x$. $\ket{L}$ and $\ket{R}$ are the $+1$
and $-1$ eigenstates of $\sigma_y$.  
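These eigenstate assignments can be checked directly (a Python/NumPy sketch, writing $\ket{H}$ and $\ket{V}$ as the standard basis vectors):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
D = (H + V) / np.sqrt(2)
A = (H - V) / np.sqrt(2)
L = (H + 1j * V) / np.sqrt(2)
R = (H - 1j * V) / np.sqrt(2)

# each Pauli operator has the corresponding pair as +1 / -1 eigenstates
for op, plus, minus in [(sz, H, V), (sx, D, A), (sy, L, R)]:
    assert np.allclose(op @ plus, +plus)
    assert np.allclose(op @ minus, -minus)
```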
\subsection{The Bloch sphere}
Since the state of a qubit must be normalized, a convenient way
of writing it is 
\begin{equation}
\cos \theta \ket{H}+e^{i\phi}\sin\theta \ket{V}.
\end{equation}
This parametrization of the qubit in terms of angles $\theta$
and $\phi$ is suggestive.  If we make the mapping
\begin{align}
x&=\sin 2\theta \cos \phi\\
y&=\sin 2\theta \sin \phi\\
z&=\cos 2\theta
\end{align}
then varying $\theta$ on $\left[0,\pi/2\right]$ and $\phi$ on
$\left[0,2\pi \right]$ fully parametrizes the unit
sphere.  This sphere provides a convenient visual representation
of the qubit.  Points that are anti-podal on the Bloch sphere represent
orthogonal states of the qubit.  Overlaps between states can be
calculated solely from the relative angle between the two
corresponding points on the Bloch sphere.    

Mixed states can also be represented in this
description; they are points in the interior of the Bloch sphere,
a region usually called the Bloch ball.
Let $\rho$ be a single-qubit density matrix 
\begin{equation}
\rho=\frac{1}{2}\left(\begin{array}{cc}1+z & x-iy\\ x+iy & 1-z\end{array}\right).
\end{equation} 
It follows from the positivity constraint on density matrices
that $\text{det} \rho=\frac{1}{4}\left(1-|{\bf r}|^2\right)\geq
0$ where ${\bf r}$ is the real-space vector $\left(x, y, z\right)$.  Any
point satisfying this inequality, lying on the surface
of or within the Bloch sphere, represents a valid density
matrix.  
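The determinant condition can be checked numerically (a Python/NumPy sketch; the Bloch vectors used are arbitrary illustrative choices):

```python
import numpy as np

def bloch_to_rho(x, y, z):
    """Density matrix for a qubit with Bloch vector (x, y, z)."""
    return 0.5 * np.array([[1 + z, x - 1j * y],
                           [x + 1j * y, 1 - z]])

# a point on the surface of the sphere: |r| = 1, pure state, det rho = 0
theta, phi = np.pi / 6, np.pi / 4   # arbitrary illustrative angles
r = (np.sin(2 * theta) * np.cos(phi),
     np.sin(2 * theta) * np.sin(phi),
     np.cos(2 * theta))
rho = bloch_to_rho(*r)
assert np.isclose(np.linalg.det(rho).real, 0.0, atol=1e-12)
assert np.isclose(np.trace(rho @ rho).real, 1.0)   # pure

# an interior point (|r| < 1) is a valid mixed state: eigenvalues >= 0
rho_mix = bloch_to_rho(0.2, 0.1, -0.3)
assert np.all(np.linalg.eigvalsh(rho_mix) >= 0)
assert np.linalg.det(rho_mix).real > 0
```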
\begin{figure}

\caption{The Bloch sphere}
\label{fig:BlochSphere}
\end{figure}

Before there was the Bloch sphere there was the
Poincar\'e sphere.  Invented in 1891 by Henri Poincar\'e, the
Poincar\'e sphere represents classical polarizations in exactly
the same way that the Bloch sphere represents qubits.  There is
a difference of convention between the two descriptions.  The
north pole of the Poincar\'e sphere represents left-circular
polarization (i.e. the $\sigma_y$ $+1$ eigenstate) whereas the north
pole of the Bloch sphere represents $\ket{H}$ (i.e. the
$\sigma_z$ $+1$ eigenstate).  The Poincar\'e sphere has the
convenient feature that all linear polarization states
(i.e. those with a real coherence between $\ket{H}$ and
$\ket{V}$) are located in the equatorial plane.  In this
dissertation we will primarily make use of the Poincar\'e sphere
description of polarization states since it seems marginally
more natural and since it was there first.  To change to the
Bloch sphere picture the reader need only rotate his head
$90^\circ$ to the right.

\subsection{Qubit transformations}
The class of transformations that can be applied to a qubit
without changing its purity form a representation of the group
$SU(2)$.  They are most easily pictured as being rotations on
the Bloch/Poincar\'e sphere.  For this reason (and for brevity),
we often speak of polarization rotations as including the full
range of SU(2) transformations, not just rotations of linear polarizations.

For polarization, any SU(2) transformation can be performed by inducing a
series of relative phase delays between two orthogonal polarizations.   A phase
delay of $\phi$ about an axis at angle $\theta$ rotates a point
on the Poincar\'e sphere through the angle $\phi$ about an axis on
the equatorial plane making an angle of $\theta$ with the H/V
axis.  One popular method of making arbitrary polarization
transformations uses waveplates.  These are thin slices of
birefringent material meant to impart a fixed phase delay $\phi$ at a
variable angle $\theta$.  Typically half waveplates with
$\phi=\pi$ and quarter waveplates with $\phi=\pi/2$ are used,
but one occasionally encounters plates with $\phi=2\pi$ and
$\phi=\pi/4$.  

A half waveplate can take any linear polarization to a different
linear polarization since a rotation of a ray on the equatorial
plane about another ray on the equatorial plane by an angle
$\pi$ will result in a ray on the equatorial plane.  Quarter
waveplates rotate by $\pi/2$ and so can take a linear
polarization into an elliptical polarization anywhere on the
hemisphere of states elliptically polarized in the same direction as
the initial linear polarization.

A quarter waveplate and a half waveplate together can take a linear
polarization to any point on the Poincar\'e sphere.  A
quarter-waveplate, half-waveplate, quarter-waveplate sequence can perform
an arbitrary rotation that takes any point on the Poincar\'e
sphere to any other point on the Poincar\'e sphere.  
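The action of a waveplate is conveniently expressed as a Jones matrix.  As a sketch (Python/NumPy, using the standard retarder form $R(\theta)\,\mathrm{diag}(1,e^{i\phi})\,R(-\theta)$; sign conventions for the retardance vary between references), a half waveplate at $22.5^\circ$ takes horizontal polarization to diagonal:

```python
import numpy as np

def rot(t):
    """Real rotation of the polarization basis by angle t."""
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

def waveplate(phi, theta):
    """Jones matrix of a retarder with phase delay phi, axis at angle theta."""
    return rot(theta) @ np.diag([1, np.exp(1j * phi)]) @ rot(-theta)

H = np.array([1, 0], dtype=complex)
D = np.array([1, 1], dtype=complex) / np.sqrt(2)

hwp = waveplate(np.pi, np.pi / 8)   # half waveplate at 22.5 degrees
out = hwp @ H
assert np.isclose(abs(D.conj() @ out), 1.0)   # H mapped to D, up to phase
```

A half waveplate at angle $\theta$ maps linear polarization at angle $\alpha$ to linear polarization at $2\theta-\alpha$, consistent with the rotation picture above.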

While this is useful in principle, it can be difficult to enact in practice
since the parameters under control, the value of $\theta$
for each waveplate, are tightly coupled, making the system difficult
to fine-tune except for some very specific sets of angles.
A much more experimentally convenient polarization controller is
one that allows $\theta$ to be fixed and a variable $\phi$ to be
applied.  This is exactly the situation with liquid crystal
variable waveplates (LCWPs) that will be discussed in the next
chapter.

\section{Entanglement}
\begin{quote}
If two separated bodies, about which, individually, we have
maximal knowledge, come into a situation in which they influence
one another and then again separate themselves, then there
regularly arises that which I just called \emph{entanglement} of
our knowledge of the two bodies.  At the outset, the joint
catalogue of expectations consists of a logical sum of the
individual catalogues; during the process the joint catalogue
develops necessarily according to the known law\ldots Our
knowledge remains maximal, but at the end, if the bodies have
again separated themselves, that knowledge does not again
decompose into a logical sum of the knowledge of the individual bodies.
--Erwin Schr\"odinger
\end{quote}

The Hilbert space of two qubits is spanned by the tensor products of the
basis states of the individual qubits, namely $\ket{HH},
\ket{HV}, \ket{VH}, \ket{VV}$.  The density matrix has $4\times
4=16$ elements.  In this space it is easy to construct states
that cannot be written as the tensor product of two
single-particle states, and similarly for density matrices.
The particles making up such states are said to be
\emph{entangled}. More generally, any $n$-particle density matrix $\rho$ that
cannot be written as a mixture of $n$-fold tensor products of density
matrices for the particles making it up is said to be entangled. 

Entanglement is viewed by many physicists as the strangest
aspect of quantum mechanics\cite{???}.  Its implications for
physical systems are profound because a system in an entangled
state cannot be described by specifying the properties of the
individual particles it contains.  Indeed, in a very real sense
such a system has no individual particle properties.  All the information
about the state is contained not in the individual particles,
but in the correlations between them.  From the point of view of
quantum information science it is one of the main properties
that make quantum information different from classical information.

For example, a state like the following
two-photon polarization state
\begin{align}
\ket{\phi^+}&=\frac{1}{\sqrt{2}}\left(\ket{HH}+\ket{VV}\right)\\
&=\frac{1}{\sqrt{2}}\left(\ket{DD}+\ket{AA}\right)\\
&=\frac{1}{\sqrt{2}}\left(\ket{RL}+\ket{LR}\right)
\end{align}
is entangled.  Each photon individually has an
equal probability of being horizontally polarized or vertically
polarized, left or right circularly polarized or diagonally or
antidiagonally polarized.  In classical polarization theory a
beam of light having this property would be considered
unpolarized.  The quantum state, though, also has correlations.
The two photons in the state will have the same polarization
when measured in the horizontal/vertical and
diagonal/anti-diagonal bases and will have opposite
polarizations in the left/right circular basis.  It was
shown by John Bell in 1964\cite{Bell's inequalities} that
the complete randomness of individual particle measurements, coupled
with the perfect correlations between the single-particle
measurements, is inconsistent with classical probability theory
and local realism.  

The particular basis of two-photon
polarization states that includes $\ket{\phi^+}$ is called the Bell
basis in his honour, and its elements the Bell states.  Reference will be made to them often
in this thesis so we list them here.
\begin{align}
\ket{\phi^+}&=\frac{1}{\sqrt{2}}\left(\ket{HH}+\ket{VV}\right)\\
\ket{\phi^-}&=\frac{1}{\sqrt{2}}\left(\ket{HH}-\ket{VV}\right)\\
\ket{\psi^+}&=\frac{1}{\sqrt{2}}\left(\ket{HV}+\ket{VH}\right)\\
\ket{\psi^-}&=\frac{1}{\sqrt{2}}\left(\ket{HV}-\ket{VH}\right)
\end{align}
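The correlations of $\ket{\phi^+}$ described above can be verified directly from the state vector (a Python/NumPy sketch using tensor products of the single-photon basis states):

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
D = (H + V) / np.sqrt(2)
A = (H - V) / np.sqrt(2)
L = (H + 1j * V) / np.sqrt(2)
R = (H - 1j * V) / np.sqrt(2)

phi_plus = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)

def prob(a, b, state):
    """Probability of projecting the two photons onto |a>|b>."""
    return abs(np.kron(a, b).conj() @ state) ** 2

# same polarization in the H/V and D/A bases...
assert np.isclose(prob(H, H, phi_plus) + prob(V, V, phi_plus), 1.0)
assert np.isclose(prob(D, D, phi_plus) + prob(A, A, phi_plus), 1.0)
# ...but opposite polarizations in the circular basis
assert np.isclose(prob(L, R, phi_plus) + prob(R, L, phi_plus), 1.0)
# each individual outcome is completely random
assert np.isclose(prob(H, H, phi_plus), 0.5)
```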

From a mathematical point of view, entanglement is an expression
of the impossibility of factoring the density matrix describing
a system on the full Hilbert space of the system into density
operators on the subsystems that go into making up the system.  

While the problem of characterizing entanglement for states of
more than two particles is complex, the two particle case is
relatively straightforward.  If the state of the whole system is
pure, then one may characterize the degree of entanglement by
performing a partial trace over one of the particles and
measuring the purity of the other particle's density matrix.
Equivalently one may compare the von Neumann entropy\cite{VonNeumann} of the
density matrix describing the composite system to those
describing the individual subsystems.  One may also calculate the Schmidt decomposition of the state
and examine the magnitude of the largest Schmidt
coefficient\cite{MikeandIke}.
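For example, the partial-trace test applied to $\ket{\phi^+}$ yields a maximally mixed single-photon state (a Python/NumPy sketch; the reshape-based partial trace assumes the usual tensor-product index ordering of \texttt{np.kron}):

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
phi_plus = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)

rho = np.outer(phi_plus, phi_plus.conj())
# partial trace over the second qubit: reshape rho to indices (i, j, k, l)
# and contract the second-subsystem indices j and l
rho_A = rho.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

# the reduced state is maximally mixed: purity 1/2, maximal entanglement
assert np.isclose(np.trace(rho_A @ rho_A).real, 0.5)

# equivalently, the Schmidt coefficients follow from an SVD of the 2x2
# amplitude matrix; two equal coefficients signal maximal entanglement
schmidt = np.linalg.svd(phi_plus.reshape(2, 2), compute_uv=False)
assert np.allclose(schmidt, [1 / np.sqrt(2), 1 / np.sqrt(2)])
```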

For mixed states one may compare the von Neumann
entropy\cite{VonNeumann}, defined as $-\text{Tr}\left[ \rho \ln
  \rho \right]$, of the
density matrix describing the composite system to those
describing the individual subsystems.  A more operationally
grounded measure of entanglement is the concurrence introduced
by Hill and Wootters\cite{Hill1997},
\begin{align}
\mathcal{C}(\rho)\equiv\max\left(0,\sqrt{\lambda_1}-\sqrt{\lambda_2}-\sqrt{\lambda_3}-\sqrt{\lambda_4}\right)
\end{align}
in which $\lambda_1,\ldots,\lambda_4$ are the eigenvalues, in
decreasing order, of
$\rho\left(\sigma_y\otimes\sigma_y\right)\rho^*\left(\sigma_y\otimes\sigma_y\right)$,
where $\rho^*$ denotes complex conjugation in the computational basis.
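A sketch of the concurrence computation (Python/NumPy), using the Hill--Wootters spin-flipped matrix $\tilde{\rho}=(\sigma_y\otimes\sigma_y)\rho^*(\sigma_y\otimes\sigma_y)$; a Bell state comes out maximally entangled and a product state unentangled:

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
syy = np.kron(sy, sy)

def concurrence(rho):
    """Wootters concurrence for a two-qubit density matrix."""
    R = rho @ syy @ rho.conj() @ syy          # rho times spin-flipped rho
    lam = np.sort(np.abs(np.linalg.eigvals(R)))[::-1]   # descending order
    s = np.sqrt(lam)
    return max(0.0, s[0] - s[1] - s[2] - s[3])

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)
phi_plus = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)

rho_bell = np.outer(phi_plus, phi_plus.conj())
rho_sep = np.diag([1.0, 0, 0, 0]).astype(complex)   # |HH><HH|, separable

assert np.isclose(concurrence(rho_bell), 1.0)   # maximally entangled
assert np.isclose(concurrence(rho_sep), 0.0)    # no entanglement
```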

\section{Distance measures}
A common problem in quantum information science, particularly in
experimental work, is to try
to characterize how close two states described by density
matrices $\rho$ and $\sigma$ are to each other.  

\section{Quantum Measurement}
Traditionally, measurements in quantum mechanics have been
described by projectors.  These are orthogonal, rank-1,
idempotent, Hermitian operators.  A projective value measure or
PVM is a set of projectors that sum to unity.  

The most general measurements, though, are the positive operator
value measures or POVMs.  A POVM is a set of
positive-semidefinite operators that sum to identity.  The
operators need not be orthogonal and the size of the set may be
larger than the dimension of the Hilbert space of states.
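As an illustration (a Python/NumPy sketch; the symmetric `trine' POVM used here is a textbook example, not a measurement employed in this thesis), three non-orthogonal qubit operators can still sum to the identity:

```python
import numpy as np

# three real qubit states separated by 120 degrees in the plane
kets = [np.array([np.cos(k * 2 * np.pi / 3), np.sin(k * 2 * np.pi / 3)])
        for k in range(3)]
# scaling by 2/3 makes the three subnormalized projectors a valid POVM
elements = [(2 / 3) * np.outer(v, v) for v in kets]

# each element is positive semidefinite...
for E in elements:
    assert np.all(np.linalg.eigvalsh(E) >= -1e-12)
# ...and, although the elements are not orthogonal and outnumber the
# Hilbert-space dimension, they sum to the identity
assert np.allclose(sum(elements), np.eye(2))
```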

In measurements of polarization, we generally model measurements
as PVMs.  The POVM formalism can be useful, though, in modeling
experimental errors in certain kinds of two-photon measurement.
This will be discussed further in chapter 2.


\subsection{Entangling measurements}
Quantum mechanics allows for a multiparticle system to be
measured in an entangled basis.  This is called an entangling
measurement.  Entangling measurements were crucial to
demonstrating the quantum teleportation protocol\cite{Teleportation} and other
quantum protocols\cite{QuantumFingerprinting}.    

They have the interesting property that they determine the
strength of correlations between 
single-particle states without collapsing the particles onto
single-particle states.  In teleportation such a measurement
erases single-qubit information on one particle and causes it to
appear, up to a known rotation, on another particle.  

The ability of entangling measurements to probe correlation information in
multiple bases at once will allow us in Chapter 5 to use them to
efficiently characterize a state's purity.  In chapter 4, we
will find that entangling measurements are essential to being
able to probe information about a system in an informationally
symmetric way.  This will in turn allow us to optimize our
approach to quantum state estimation.

\section{Summary}
We have discussed some of the major concepts in quantum
information research necessary to understand the experiments
presented in this thesis.  In particular, density-matrix
formalism, the qubit paradigm, entanglement and the role of
quantum measurements have been discussed.  The next chapter
will discuss general methodology and experimental concerns
before theoretical and experimental results are discussed in
chapters 3 to 6.






















\ignore{\section{Quantum state estimation}
A central problem in experimental quantum information science
is, given a source of quantum states described by a density
matrix $\rho$, estimate $\rho$ from a set of measurements
performed on the source.  As was alluded to earlier, since
$\rho$ is an observable, it should always be possible to
decompose it into a set of Hermitian operators that can be
measured.  So long as enough measurements are taken to get a
good estimate of the expectation values of the operators in the
decomposition, it will be possible to obtain a good estimate of
the density matrix.  A set of expectation values that contains
sufficient information to unambiguously reconstruct the density
matrix is called \emph{informationally complete}.

For polarization qubits, the 

\subsection{Linear inversion}
The simplest and most intuitive approach to quantum state
estimation is to regard the mapping of experimental measurements
to a density matrix as a particular kind of matrix inversion.
Linear inversion is possible because Hermitian operators form a
Hilbert space, and so the density matrix will be related to a
complete set of measurement expectation values through a linear
map (i.e. a matrix) that is a function only of the particular
measurement operators that were implemented.  The major downside
to this approach is that it does not easily accommodate
the positivity constraint on the density matrix.
Linear inversion remains valuable nonetheless: it is the only
available analytic inversion tool, which makes it useful for
conceptual understanding of state estimation and as a means of
calculating error propagation without resorting to Monte Carlo
techniques.

Suppose we measure $m$ measurement operators $\hat{P}_m$ on a
$d$-dimensional Hilbert space.  These operators need to be
complete in order for the inversion to work, but they need not
all be linearly independent -- $m$ can exceed $d^2$.  For each measurement operator
we can construct a vector by matrix \emph{flattening}, i.e. by
constructing a $d^2$-dimensional vector by concatenating the
rows of $\hat{P}_m$.  We now construct the $m \times d^2$ matrix
${\bf M}$ whose $m^\text{th}$ row is the flattened operator
$\hat{P}_m$.  Let $s_m$ be the measured expectation values; then
from equation \ref{eq:expectation_values} we have that 
\begin{align}
s_m&=\text{Tr}\left[\rho \hat{P}_m\right]\\
&=\sum_{i=1}^d\sum_{j=1}^d \rho_{ij}\left(\hat{P}_m\right)_{ji}
\end{align}
or, treating the $s_m$ as the elements of an $m$-element column vector $\vec{s}$,
\begin{equation}
\vec{s}={\bf M}\vec{\rho}
\end{equation} 
where $\vec{\rho}$ is the flattened version of $\rho$.

Applying a standard technique from linear regression theory we
can multiply both sides of this equation by ${\bf M}^\dagger$ (the
conjugate transpose, since the flattened operators make ${\bf M}$
complex in general) to obtain
\begin{equation}
{\bf M}^\dagger \vec{s}={\bf M}^\dagger {\bf M}\vec{\rho}.
\end{equation}
Since ${\bf M}$ is $m \times d^2$, ${\bf M}^\dagger$ is $d^2 \times
m$, so ${\bf M}^\dagger \vec{s}$ is a $d^2$-dimensional vector and ${\bf
  M}^\dagger {\bf M}$ is a $d^2 \times d^2$ square matrix whose
inverse, $\left({\bf M}^\dagger {\bf M}\right)^{-1}$, allows us to
solve the equation for $\vec{\rho}$:
\begin{equation}
\vec{\rho}=\left({\bf M}^\dagger {\bf M}\right)^{-1}{\bf M}^\dagger \vec{s}.
\end{equation}
If the problem is exactly constrained so that $m=d^2$ then
$\rho$ is exactly consistent with all the data in $\vec{s}$.  If
the problem is over-constrained so that $m>d^2$, then $\rho$ is
the density matrix that minimizes the least-squares
estimator, i.e. the summed squared deviation between the
values of $s_m$ predicted from $\rho$ and the
measured $s_m$.
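As a concrete illustration, the flattening and least-squares inversion just described can be sketched numerically.  This is a minimal sketch only: the six-projector polarization measurement set and the test state are illustrative assumptions, not values taken from the experiments in this thesis.

```python
# Minimal linear-inversion tomography sketch for a single qubit.
# The six-projector measurement set and the test state below are
# illustrative assumptions, not taken from the experiment.
import numpy as np

# Projectors onto the H, V, D, A, R, L polarization states
kets = [np.array([1, 0]), np.array([0, 1]),
        np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2),
        np.array([1, 1j]) / np.sqrt(2), np.array([1, -1j]) / np.sqrt(2)]
projectors = [np.outer(k, k.conj()) for k in kets]

# Row m of M is the flattened P_m, arranged so that (M @ rho_vec)[m]
# equals Tr[rho P_m].  Since Tr[rho P] = sum_ij rho_ij P_ji, each row
# is the row-concatenation of P_m transposed.
M = np.array([P.T.reshape(-1) for P in projectors])

# Simulate noiseless expectation values s_m = Tr[rho P_m]
rho_true = np.array([[0.7, 0.3], [0.3, 0.3]], dtype=complex)
s = (M @ rho_true.reshape(-1)).real

# Least-squares inversion, equivalent to (M^dag M)^{-1} M^dag s
rho_vec, *_ = np.linalg.lstsq(M, s.astype(complex), rcond=None)
rho_est = rho_vec.reshape(2, 2)  # recovers rho_true up to round-off
```

With noiseless data the reconstruction is exact; with finite counting statistics the same inversion returns the least-squares estimate discussed above, which need not be positive.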

Unfortunately, because no positivity constraint was imposed, it
is possible that the estimated density matrix, which we will call
$\hat{\rho}$, will have negative eigenvalues.  This makes it
impossible to calculate quantities from $\hat{\rho}$ that depend
on its positivity, most notably entanglement-related properties
like the concurrence and fidelities with other states.  It is
therefore usually necessary to employ a more sophisticated
methodology such as the maximum-likelihood fitting discussed in
the next section.  It should be noted, though, that if the
density matrix obtained from linear inversion is positive,
then it is also the maximum-likelihood estimator for the
least-squares likelihood function.

Another, less well-motivated, approach to dealing with the
negative eigenvalues that often arise in linear inversion is
simply to replace them with zeros.  This can be done by
diagonalizing the density matrix, replacing the negative
eigenvalues with zeros and rotating back to the original basis.
This was sometimes used as a computationally quick means of
estimating the density matrix, but was found to diverge
significantly from the maximum-likelihood estimate, especially
for pure states and poor statistics.
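The eigenvalue-truncation fix can be sketched as follows.  The final renormalization to unit trace is our own assumption, since the text does not specify how the trace is restored after clipping.

```python
# Sketch of the eigenvalue-truncation fix described above.  The final
# renormalization to unit trace is an assumption; the text does not
# specify how the trace is restored after clipping.
import numpy as np

def clip_negative_eigenvalues(rho):
    """Zero out negative eigenvalues of a Hermitian matrix, then renormalize."""
    vals, vecs = np.linalg.eigh(rho)                # rho is assumed Hermitian
    vals = np.clip(vals, 0.0, None)                 # replace negative eigenvalues with 0
    rho_pos = vecs @ np.diag(vals) @ vecs.conj().T  # rotate back to the original basis
    return rho_pos / np.trace(rho_pos).real         # restore unit trace

# Example: a linear-inversion estimate with one negative eigenvalue
rho_hat = np.diag([1.1, -0.1]).astype(complex)
rho_phys = clip_negative_eigenvalues(rho_hat)  # -> diag(1.0, 0.0)
```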

\subsubsection{Error analysis}
Because the linear inversion is analytic, it can give real
insight into how errors propagate from estimated expectation
values to the density matrix.  In photon counting experiments
such as the ones discussed in this thesis, non-systematic errors
arise largely due to the counting statistics.
Although bosonic stimulation can sometimes lead to
super-Poissonian counting statistics, in the experiments examined here,
where the probability of obtaining more than one pair within a
photon coherence time is vanishingly small, pair creation can be
regarded as completely independent and so the statistics are very
well modeled by a Poissonian distribution.  This can be seen
from experimental evidence in Figure
\ref{fig:Poissonian_statistics}.

Expectation values are estimated from counts by dividing the
number of counts for a given PVM element by the total number for
a complete PVM within the same time period.  For Poissonian
statistics the variance is equal to the mean, so 


\subsection{Maximum likelihood estimation}
If one wants to take the positivity constraint into account then
one is forced to adopt numerical techniques to try to find the
best-fit density matrix to explain a given data set.  The
approach taken from standard model-fitting methods is called
maximum-likelihood fitting.  One develops a model to 
}
